Background: Sample size calculations are an important tool for planning epidemiological 
studies. Large sample sizes are often required in Mendelian randomization investigations. 
Methods and results: Resources are provided for investigators to perform sample size 
and power calculations for Mendelian randomization with a binary outcome. We initially 
provide formulae for the continuous outcome case, and then analogous formulae for the 
binary outcome case. The formulae are valid for a single instrumental variable, which 
may be a single genetic variant or an allele score comprising multiple variants. Graphs 
are provided to give the required sample size for 80% power for given values of the 
causal effect of the risk factor on the outcome and of the squared correlation between 
the risk factor and instrumental variable. R code and an online calculator tool are made 
available for calculating the sample size needed for a chosen power level given these 
parameters, as well as the power given the chosen sample size and these parameters. 
Conclusions: The sample size required for a given power of a Mendelian randomization 
investigation depends greatly on the proportion of variance in the risk factor explained by 
the instrumental variable. The inclusion of multiple variants into an allele score to explain 
more of the variance in the risk factor will improve power; however, care must be taken 
not to introduce bias by the inclusion of invalid variants. 
Key Messages 

• Resources are provided for investigators to perform sample size and power calculations for Mendelian randomization 
with a binary outcome. 

• The sample size required for a given power level is greater with a binary outcome than a continuous outcome, and is 
highly dependent on the proportion of the variance in the risk factor explained by the instrumental variable. 
Introduction 

Sample size calculations are an important part of experimen- 
tal design. They inform an investigator of the expected power 
of a given analysis to reject the null hypothesis. If the power 
of an analysis is low, then not only is the probability of re- 
jecting the null hypothesis low, but when the null hypothesis 
is rejected, the posterior probability that the rejection of the 
null hypothesis is not simply a chance finding is low. 1 

Mendelian randomization is the use of genetic variants 
as instrumental variables for assessing the causal effect of a 
risk factor on an outcome from observational data. 2 
Genetic variants are chosen which are specifically associ- 
ated with a risk factor of interest, and not associated with 
variables which may be confounders of the association be- 
tween the risk factor and outcome. 3 Such a variant divides 
the population into groups which are similar to treatment 
arms in a randomized controlled trial. 4 Under the instru- 
mental variable assumptions, 5,6 a statistical association be- 
tween the genetic variant and the outcome implies that the 
risk factor has a causal effect on the outcome. 7 However, 
as genetic variants typically explain a small proportion of 
the variance in risk factors, the power to detect a signifi- 
cant association between the variant and outcome in an 
applied Mendelian randomization context can be low. 8 
Sample size analysis is particularly important to inform 
whether a null finding is representative of a true null causal 
relationship, or simply a lack of power to detect an effect 
size of clinical interest. 

Sample size calculations have been previously presented 
for Mendelian randomization experiments with continu- 
ous outcomes. Calculations based on asymptotic statistical 
theory have been presented with a single instrumental vari- 
able (IV), whether that IV is a single genetic variant or an 
allele score. 9 An allele score (also called a genetic risk 
score) is a single variable summarizing multiple genetic 
variants as a weighted or unweighted sum of risk factor- 
increasing alleles. 10 A simulation study for estimating 
power has also been presented with both single and mul- 
tiple IVs. 11 These approaches have shown good agreement. 
However, in many cases, the outcome in a Mendelian ran- 
domization experiment is binary (dichotomous), such as 
disease. In this paper, we present power calculations for 
Mendelian randomization studies with a binary outcome. 
We assume the context of a case-control study where the 
causal parameter of interest is an odds ratio, although the 
calculations are also valid for other study designs. 

Methods and Results 

We give results for the asymptotic variance of IV estimators 
with a single IV, and for the resulting sample size 
needed in a Mendelian randomization study to obtain a 
given power level. We initially present formulae with a con- 
tinuous outcome (this reviews material previously covered 
by Freeman et al. 9 ) and then analogous formulae with a bin- 
ary outcome. We concentrate on estimates from the ratio 
(or Wald) method, as this method makes few parametric as- 
sumptions, relying only on a linear relationship between the 
conditional expectation of the outcome (or in the binary 
case, the logistic function of the probability of the outcome) 
and the risk factor. 12 If the imprecision in the estimate of 
the genetic association with the risk factor is negligible, then 
estimates of power and sample size from the ratio method 
also correspond to those from assessment of the causal rela- 
tionship of the risk factor on the outcome by testing the as- 
sociation between the genetic variant and outcome. 

Other estimation approaches are possible with a binary 
outcome 13 but these either give equivalent estimates to the 
ratio method with a single IV (the two-stage predictor sub- 
stitution method 14 ) or are not recommended for general 
use in applied practice. These include the two-stage re- 
sidual inclusion method, due to inconsistency for a param- 
eter with a natural interpretation, 15 and the generalized 
method of moments (GMM) and structural mean models 
(SMM) methods, due to potential lack of identifiability of 
the causal parameter (S Burgess et al., unpublished data). 

Power with a continuous outcome 

With a single IV and a continuous outcome, the IV esti- 
mates from the ratio, two-stage least squares (2SLS) and 
limited information maximum likelihood (LIML) methods 
coincide. 16 The estimator can be expressed as the ratio of 
the coefficient from the regression of the outcome 
(Y) on the genetic variant (G) to the coefficient 
from the regression of the risk factor (X) on the variant: 

$$\hat{\beta}_{IV} = \frac{\hat{\beta}_{GY}}{\hat{\beta}_{GX}} \tag{1}$$

The asymptotic variance of this IV estimator is given by 
the formula: 

$$\mathrm{var}(\hat{\beta}_{IV}) = \frac{\mathrm{var}(R_Y)}{N \, \mathrm{var}(X) \, \rho^2_{GX}} \tag{2}$$

where $R_Y = Y - \beta_1 X$ is the residual of the outcome on 
subtraction of the causal effect of the risk factor, and $\rho^2_{GX}$ is 
the square of the correlation between the risk factor X and 
the IV G. 17 The coefficient of determination ($R^2$) in the 
regression of the risk factor on the IV is an estimate of $\rho^2_{GX}$. 
The IV in these calculations could either be a single genetic 
variant or an allele score. 10 

The asymptotic variance of the conventional regression 
(ordinary least squares, OLS) estimator of the association 
between the risk factor X and the outcome Y is given by 
the formula: 

$$\mathrm{var}(\hat{\beta}_{OLS}) = \frac{\mathrm{var}(R_Y)}{N \, \mathrm{var}(X)} \tag{3}$$



The sample size necessary for an IV analysis to demonstrate 
a non-zero association for a given magnitude of causal 
effect is therefore approximately equal to that for a conventional 
epidemiological analysis to demonstrate the same 
magnitude of association divided by the $\rho^2_{GX}$ value for the 
IV. 18 If the significance level is $\alpha$ and the power desired to 
test the null hypothesis is $1 - \beta$, then the sample size required 
to test a causal effect of size $\beta_1$ using IV analysis is: 9 

$$\text{Sample size} = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \, \mathrm{var}(R_Y)}{\mathrm{var}(X) \, \beta_1^2 \, \rho^2_{GX}} \tag{4}$$

where z is a quantile function, so that $z_a$ is the 100a 
percentile point on the standard normal distribution. If the 
significance level is 0.05 and the power is 0.8, then the 
sample size to test for a change of $\beta_1$ standard deviations in 
Y per standard deviation increase in X is: 

$$\text{Sample size} = \frac{7.848}{\beta_1^2 \, \rho^2_{GX}} \tag{5}$$
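
As an illustration of equations (4) and (5), the following Python sketch computes the required sample size for a continuous outcome. The function name `sample_size_continuous` and its arguments are hypothetical, chosen for this example rather than taken from the R code or online tool accompanying the paper. The `var_ratio` argument stands for var(R_Y)/var(X) and is assumed to equal 1 when both variables are standardized.

```python
from math import ceil
from statistics import NormalDist


def sample_size_continuous(beta1, rho2_gx, alpha=0.05, power=0.8, var_ratio=1.0):
    """Sample size from equation (4).

    beta1     : causal effect (SD change in Y per SD increase in X)
    rho2_gx   : squared correlation between risk factor and IV
    var_ratio : var(R_Y) / var(X); assumed 1.0 for standardized variables
    """
    z = NormalDist()  # standard normal distribution
    # With alpha = 0.05 and power = 0.8 this multiplier is about 7.848.
    multiplier = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return multiplier * var_ratio / (beta1 ** 2 * rho2_gx)


# Example: effect of 0.2 SD, IV explaining 2% of the variance in the risk factor.
required_n = sample_size_continuous(beta1=0.2, rho2_gx=0.02)
print(ceil(required_n))
```

With these inputs the sketch gives a sample size a little under 10 000, in line with equation (5).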



For a given sample size N, the power to detect a causal effect 
(in the same direction as the true effect) can be calculated as: 

$$\text{Power} = \Phi\left(\beta_1 \rho_{GX} \sqrt{N} - z_{1-\alpha/2}\right) \tag{6}$$

where $\Phi$ is the cumulative distribution function of the 
standard normal distribution. This is the inverse function 
of the quantile function ($\Phi(z_a) = a$). 
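
Equation (6) can be evaluated directly, as in the following Python sketch. The function name `power_continuous` is hypothetical, and the sketch assumes standardized variables so that var(R_Y)/var(X) is taken to be 1.

```python
from math import sqrt
from statistics import NormalDist


def power_continuous(beta1, rho2_gx, n, alpha=0.05):
    """Power from equation (6) for a continuous outcome and a single IV.

    beta1   : causal effect (SD change in Y per SD increase in X)
    rho2_gx : squared correlation between risk factor and IV
    n       : sample size
    """
    z = NormalDist()  # standard normal distribution
    # beta1 * rho_GX * sqrt(N), written with the squared correlation
    signal = beta1 * sqrt(rho2_gx * n)
    return z.cdf(signal - z.inv_cdf(1 - alpha / 2))


for n in (1000, 5000, 10000):
    print(n, round(power_continuous(beta1=0.2, rho2_gx=0.02, n=n), 3))
```

As the sample size tends to zero, the value returned tends to 0.025, which is the probability of rejecting the null hypothesis in the positive direction when there is no information in the data.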



We use these formulae to construct power curves for 
Mendelian randomization using a significance level of 
0.05. In Figure 1 (left), we fix the squared correlation $\rho^2_{GX}$ 
at 0.02, meaning the variant explains on average 2% of the 
variance of the risk factor, and vary the size of the effect 
$\beta_1$ = 0.05, 0.1, 0.15, 0.2, 0.25, 0.3 and the sample size 
N = 1000 to 10 000. In Figure 1 (right), we fix the size of 
the effect at $\beta_1$ = 0.2 and vary the squared correlation 
$\rho^2_{GX}$ = 0.005, 0.01, 0.015, 0.02, 0.025, 0.03 and the 
sample size as before. In each of the figures, the power to 
detect a positive causal relationship is displayed; this tends to 
0.025 as the sample size tends to zero. We see that the 
power increases as the causal effect increases, and as the IV 
explains more of the variance in the risk factor (the $\rho^2_{GX}$ 
parameter or the expected value of the $R^2$ statistic 
increases). 

Similar formulae to these have been made available in 
an online tool for calculating either power for a given 
sample size or sample size needed for a given power, taking the 
causal effect ($\beta_1$) and squared correlation ($\rho^2_{GX}$) 
parameters, as well as the variance of the risk factor and 
outcome, and the observational (OLS) coefficient from 
regression of the outcome on the risk factor. 19 



Figure 1. Power curves varying the sample size with continuous outcome and a single instrumental variable. Left panel: for a fixed value of the IV 
strength ($\rho^2_{GX}$ = 0.02) and different values of the size of the causal effect ($\beta_1$ = 0.05, 0.1, ..., 0.3). Right panel: for a fixed value of the causal effect 
($\beta_1$ = 0.2) and varying the size of the IV strength ($\rho^2_{GX}$ = 0.005, 0.01, ..., 0.03). Horizontal axes: sample size (0 to 10 000). 

Power with a binary outcome 

With a single IV and a binary outcome, the same IV estimator 
(1) as in the continuous outcome case can be evaluated, 
except that a logistic model is typically used in the 
regression of the outcome on the genetic variant. 12 The 
asymptotic variance of this estimator can be approximated 
using the delta method for the ratio of two estimates. 20 
The leading term in the expansion is: 



$$\mathrm{var}(\hat{\beta}_{IV}) \approx \frac{\mathrm{var}(\hat{\beta}_{GY})}{\beta_{GX}^2} \tag{7}$$



Although further terms from the delta method could be 
included, these are usually much smaller in magnitude. In 
the simulation example later in the paper, if the association 
between the risk factor and IV is estimated using data on 
the entire sample of control participants, the second and 
third terms in the expansion are two orders of magnitude 
smaller than the leading term (Figure 1). The asymptotic 
variance of the coefficient $\hat{\beta}_{GY}$ from logistic regression is: 

$$\mathrm{var}(\hat{\beta}_{GY}) = \frac{1}{E\left[\sum_i g_i^2 \, P(Y = 1 \mid G = g_i) \, P(Y = 0 \mid G = g_i)\right]} \tag{8}$$



where i indexes individuals. This expression is obtained by 
differentiation of the log-likelihood. If the probability of 
an event does not depend greatly on the value of the genetic 
IV, then $P(Y = 1 \mid G = g_i) \approx P(Y = 1)$, which is the ratio 
of cases to participants in the sample. This approximation 
will be reasonable if the genetic variant does not explain a 
large proportion of the variance in the risk factor, and/or 
the effect of the risk factor on the outcome is not extreme. 
We assume (without loss of generality) that the mean of G 
is 0 and the variance is 1, so that $E(\sum_i g_i^2) = N$, where N is 
the sample size. The square of the coefficient $\beta_{GX}$ is 
approximately equal to $\mathrm{var}(X) \, \rho^2_{GX}$. This gives: 

$$\mathrm{var}(\hat{\beta}_{IV}) \approx \frac{1}{N \, \mathrm{var}(X) \, \rho^2_{GX} \, P(Y = 1) \, P(Y = 0)} \tag{9}$$



The sample size required to detect an effect of size $\beta_1$ per 
standard deviation increase in X for 80% power with a 
significance level of 0.05 is therefore 

$$\text{Sample size} = \frac{7.848}{\beta_1^2 \, \rho^2_{GX} \, P(Y = 1) \, P(Y = 0)} \tag{10}$$



where the effect $\beta_1$ is a log odds ratio. If there are to be 
an equal number of cases and controls, P(Y = 1) = 
P(Y = 0) = 0.5, and: 

$$\text{Sample size} = \frac{31.392}{\beta_1^2 \, \rho^2_{GX}} \tag{11}$$
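
Equations (10) and (11) can be sketched in Python as follows. The function name `sample_size_binary` and the `case_fraction` argument (the proportion of the sample who are cases) are hypothetical names introduced for this illustration, and the risk factor is assumed to be standardized so that var(X) = 1.

```python
from math import ceil, log
from statistics import NormalDist


def sample_size_binary(beta1, rho2_gx, case_fraction=0.5, alpha=0.05, power=0.8):
    """Total sample size (cases plus controls) from equation (10).

    beta1         : causal effect as a log odds ratio per SD increase in X
    rho2_gx       : squared correlation between risk factor and IV
    case_fraction : P(Y = 1) in the sample, 0.5 for a 1:1 design
    """
    z = NormalDist()  # standard normal distribution
    multiplier = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return multiplier / (beta1 ** 2 * rho2_gx * case_fraction * (1 - case_fraction))


# Example: odds ratio of 1.2 per SD, IV explaining 2% of the variance, 1:1 design.
total_n = sample_size_binary(beta1=log(1.2), rho2_gx=0.02)
print("total sample size:", ceil(total_n))
print("number of cases:", ceil(total_n / 2))
```

Setting `case_fraction` to 0.5 multiplies the continuous-outcome requirement by four, which is the step from equation (5) to equation (11).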



The corresponding power to detect a causal effect of size 
$\beta_1$ with a significance level of 0.05 is: 

$$\Phi\left(\beta_1 \rho_{GX} \sqrt{N \, P(Y = 1) \, P(Y = 0)} - 1.96\right). \tag{12}$$

Similar power curves to Figure 1 in the binary outcome 
setting are given in the Web Appendix (available as 
Supplementary data at IJE online). 
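
A minimal Python sketch of equation (12) is given below. The function name `power_binary` and its arguments are hypothetical, and the risk factor is assumed to be standardized. The two example calls use parameter values that also appear in the formula columns of Table 1.

```python
from math import sqrt
from statistics import NormalDist


def power_binary(beta1, rho2_gx, n_total, case_fraction=0.5, alpha=0.05):
    """Power from equation (12) for a binary outcome and a single IV.

    beta1         : causal effect as a log odds ratio per SD increase in X
    rho2_gx       : squared correlation between risk factor and IV
    n_total       : total sample size (cases plus controls)
    case_fraction : P(Y = 1) in the sample
    """
    z = NormalDist()  # standard normal distribution
    # N * P(Y = 1) * P(Y = 0) from equation (12)
    effective_n = n_total * case_fraction * (1 - case_fraction)
    return z.cdf(beta1 * sqrt(rho2_gx * effective_n) - z.inv_cdf(1 - alpha / 2))


# 10 000 cases and 10 000 controls
print(round(100 * power_binary(0.1, 0.01, 20000), 1))
print(round(100 * power_binary(0.2, 0.02, 20000), 1))
```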



We use these approximations to calculate the number 
of cases needed to obtain 80% power in a Mendelian 
randomization analysis with a binary outcome for different 
values of $\beta_1$ and $\rho^2_{GX}$, assuming a 1:1 ratio of cases to 
controls. The results are displayed in Figure 2. We note that 
when the genetic variants explain a small proportion of the 
variance in the risk factor, large sample sizes are required 
to detect even moderately large causal effects with 
reasonable power. 

An R 21 script for performing sample size and power 
calculations is provided in the Appendix (available as 
Supplementary data at IJE online). This code enables the 
calculation of the sample size required for a chosen power 
level given the values of $\beta_1$ and $\rho^2_{GX}$, as well as the power 
given the values of $\beta_1$, $\rho^2_{GX}$ and the chosen sample size. 
A calculator using this code is available online. 22 

Validation simulation 

In order to validate the estimates of sample size and power, 
we simulate data on a genetic variant, a continuous risk 
factor and an outcome. The data-generating model for in- 
dividuals indexed by i is: 



$$\begin{aligned} g_i &\sim N(0, 1) \\ x_i &\sim N(g_i \rho_{GX}, 1 - \rho^2_{GX}) \\ y_i &\sim \mathrm{Binomial}(1, \mathrm{expit}(\beta_0 + \beta_1 x_i)) \end{aligned} \tag{13}$$

where $\mathrm{expit}(x) = \exp(x) / (1 + \exp(x))$ is the inverse of the 
logit function and $\beta_1$ is the log odds ratio per unit (which 
here equals 1 standard deviation) increase in the risk factor. 
The genetic variant is modelled by a standard normal 
distribution; it can be regarded as a standardized weighted 
allele score. The parametric relationship between X, G and 
$\rho_{GX}$ ensures that the proportion of variance in the risk 
factor explained by the instrumental variable in a large sample 
is $\rho^2_{GX}$. We also simulate data with a dichotomous risk 
factor; details are given in the Web Appendix (available as 
Supplementary data at IJE online). 
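
The data-generating model (13) and the subsequent selection of cases and controls can be sketched in Python using only the standard library. This is an illustrative sketch rather than the simulation code used for the paper: the function names, the seeds and the population size are assumptions made for the example, and the second argument of the normal distribution in (13) is read as a variance.

```python
import math
import random


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_population(n, beta0, beta1, rho2_gx, seed=0):
    """Draw n individuals (g, x, y) from the data-generating model (13)."""
    rng = random.Random(seed)
    rho = math.sqrt(rho2_gx)
    sd_x = math.sqrt(1.0 - rho2_gx)  # residual SD, so that var(X) = 1
    population = []
    for _ in range(n):
        g = rng.gauss(0.0, 1.0)        # standardized allele score
        x = rng.gauss(rho * g, sd_x)   # continuous risk factor
        y = 1 if rng.random() < expit(beta0 + beta1 * x) else 0
        population.append((g, x, y))
    return population


def case_control_sample(population, n_cases, n_controls, seed=1):
    """Select the required numbers of cases and controls at random."""
    rng = random.Random(seed)
    cases = [row for row in population if row[2] == 1]
    controls = [row for row in population if row[2] == 0]
    return rng.sample(cases, n_cases) + rng.sample(controls, n_controls)


population = simulate_population(20000, beta0=-3.0, beta1=0.2, rho2_gx=0.02)
prevalence = sum(row[2] for row in population) / len(population)
sample = case_control_sample(population, n_cases=500, n_controls=500)
print(round(prevalence, 3), len(sample))
```

The population needs to be large relative to the number of cases required, because only about 5% of simulated individuals have the outcome when the intercept is -3.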

We set $\beta_0 = -3$ so that the outcome has a prevalence of 
about 5% in the population from which the case-control 
sample is taken. We take three values of $\beta_1$ = 0.1, 0.2, 0.3, 
three values of $\rho^2_{GX}$ = 0.01, 0.02, 0.03, three sample sizes 
(10 000, 20 000, and 30 000 cases), and two values of the 
ratio of cases to controls (1:1 and 1:2). For each set of 
parameter values, we calculate the estimate of the power from 
equation (12) using a significance level of 0.05, and 
compare this with the number of times the 95% confidence 
interval for the ratio estimate excludes the null based on 
10 000 simulated datasets. 

Figure 2. Number of cases required in a Mendelian randomization analysis with a binary outcome and a single instrumental variable for 80% power 
with a 5% significance level and 1:1 ratio of cases:controls, varying the size of causal effect [odds ratio per standard deviation (SD) increase in risk 
factor, $\exp(\beta_1)$] for different values of IV strength. Left panel: $\rho^2_{GX}$ = 1%-8%. Right panel: $\rho^2_{GX}$ = 0.5%-3.0%. Horizontal axes: odds ratio per SD increase in risk factor. 

The 95% confidence interval for the ratio method used 
in calculating the power of the simulation method is 
constructed using Fieller's method, 23 and so does not rely 
on the same asymptotic assumption as the analytical 
method for estimating the power. Previous simulations 
have shown that confidence intervals from Fieller's method 
maintain nominal coverage levels even with weak 
instruments. 16 To obtain a case-control sample of the necessary 
size, we initially simulate data for a large number of 
individuals, and then take the required number of cases and 
controls from this population. 
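
For readers unfamiliar with Fieller's method, the following Python sketch shows one common form of the calculation for a ratio of two estimates. It is not taken from the paper's code: the function names are hypothetical, the two coefficient estimates are assumed to be uncorrelated, and the sketch simply returns `None` when the confidence set is unbounded (which can happen when the IV is weak).

```python
from math import sqrt
from statistics import NormalDist


def fieller_interval(beta_gy, se_gy, beta_gx, se_gx, level=0.95):
    """Fieller confidence interval for the ratio beta_gy / beta_gx.

    Assumes zero covariance between the two estimates. The interval is the
    set of ratios r satisfying
        (beta_gy - r * beta_gx)^2 <= z^2 * (se_gy^2 + r^2 * se_gx^2),
    which is a quadratic inequality in r.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    a = beta_gx ** 2 - z ** 2 * se_gx ** 2
    b = -2 * beta_gy * beta_gx
    c = beta_gy ** 2 - z ** 2 * se_gy ** 2
    disc = b * b - 4 * a * c
    if a <= 0 or disc < 0:
        return None  # unbounded confidence set; not handled in this sketch
    lower = (-b - sqrt(disc)) / (2 * a)
    upper = (-b + sqrt(disc)) / (2 * a)
    return lower, upper


def excludes_null(interval):
    """True when a bounded interval lies entirely on one side of zero."""
    return interval is not None and (interval[0] > 0 or interval[1] < 0)


interval = fieller_interval(beta_gy=0.02, se_gy=0.01, beta_gx=0.1, se_gx=0.01)
print(interval, excludes_null(interval))
```

When the standard error of the genetic association with the risk factor is zero, the interval reduces to the confidence interval for the genetic association with the outcome divided by the genetic association with the risk factor.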

Simulation results 

Results from the validation simulation are given in 
Table 1. The Monte Carlo standard error (the expected 
variation from the true value due to the limited number of 
simulations) in the simulation estimates of power is at 
most 0.5%. The coverage levels of the 95% confidence 
interval from Fieller's method are close to 95% throughout 
(between 94.8 and 95.9 for the 54 scenarios). 
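
The bound on the Monte Carlo standard error follows from the binomial variance of a proportion estimated from 10 000 independent simulated datasets. A short Python check, with a hypothetical function name, is:

```python
from math import sqrt


def monte_carlo_se(power, n_sims=10000):
    """Standard error of a power estimate based on n_sims simulated datasets."""
    return sqrt(power * (1 - power) / n_sims)


# The standard error is largest when the true power is 0.5.
worst_case = monte_carlo_se(0.5)
print(f"{100 * worst_case:.2f}%")
```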

We note that estimates of power from the formula of 
equation (12) are similar to those from the simulation ap- 
proach. There is no apparent systematic bias in the esti- 
mates from the analytical formula, with simulation 
estimates being greater and less than those from the for- 
mula a similar number of times (when rounded to nearest 
0.1%, the estimate from the simulation was less 24 times 
and greater 19 times). Estimates from both approaches are 
no more different than would be expected due to chance 
alone. Similar results are obtained with a dichotomous risk 
factor; details are given in Web Table A1 (available as 
Supplementary data at IJE online). In comparing estimates 
of power with equal numbers of cases, greater power is 
achieved when there is a case:control ratio of 1:2 than with 
a ratio of 1:1. However, when the total sample size is fixed, 
the estimate of power is greatest when the numbers of 
cases and controls are equal. This can be seen by compar- 
ing estimates with 30 000 cases and a ratio of 1:1, and 
with 20 000 cases and a ratio of 1:2. 
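
The effect of the case:control ratio can be checked against equation (12), as in the following Python sketch. The function name `power_case_control` and its arguments are hypothetical, and the example uses a log odds ratio of 0.1 with an IV explaining 1% of the variance in the risk factor.

```python
from math import sqrt
from statistics import NormalDist


def power_case_control(beta1, rho2_gx, n_cases, controls_per_case, alpha=0.05):
    """Power from equation (12), parameterized by the number of cases."""
    z = NormalDist()  # standard normal distribution
    n_total = n_cases * (1 + controls_per_case)
    p_case = 1 / (1 + controls_per_case)  # P(Y = 1) in the sample
    effective_n = n_total * p_case * (1 - p_case)
    return z.cdf(beta1 * sqrt(rho2_gx * effective_n) - z.inv_cdf(1 - alpha / 2))


# Same number of cases (10 000): adding controls increases power.
same_cases_1to1 = power_case_control(0.1, 0.01, 10000, 1)
same_cases_1to2 = power_case_control(0.1, 0.01, 10000, 2)

# Same total sample size (60 000): equal numbers of cases and controls is best.
same_total_1to1 = power_case_control(0.1, 0.01, 30000, 1)
same_total_1to2 = power_case_control(0.1, 0.01, 20000, 2)

print(round(same_cases_1to1, 3), round(same_cases_1to2, 3))
print(round(same_total_1to1, 3), round(same_total_1to2, 3))
```

For a fixed total N, the product P(Y = 1) P(Y = 0) in equation (12) is maximized at 0.25 when half of the sample are cases, which is why the 1:1 design is the more powerful of the two in the second comparison.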

In response to concerns from a reviewer that the power 
estimates may not be valid with a discrete instrumental 
variable (such as a single nucleotide polymorphism) or 
when there is confounding, additional validation simula- 
tions were performed in these scenarios. Results are given 
in the Web Appendix (Web Tables A2-A4, available as 
Supplementary data at IJE online). No substantial differ- 
ences were observed from the validation simulation in the 
main paper when the instrumental variable was discrete. 
When there was confounding, estimates from the analyt- 
ical formula slightly overestimated power, particularly 
when the confounding was in the same direction as the 
causal effect. However, this overestimation was slight (on 
average less than 1% when the confounding was in the opposite 
direction, and less than 2% when the confounding 
was in the same direction). As the magnitude of confound- 
ing is not possible to estimate in applied practice, conserva- 
tive estimates of the correlation and causal effect 
parameters used in power calculations are recommended, 
particularly if confounding is thought to be substantial. 
Table 1. Validation simulation to compare estimates of power in a Mendelian randomization analysis with a continuous risk 
factor and a binary outcome from analytical formula and simulation study with a 5% significance level, varying the size of causal 
effect ($\beta_1$), the IV strength ($\rho^2_{GX}$), the sample size and the ratio of cases to controls 

Case:control ratio = 1:1 

| $\rho^2_{GX}$ | $\beta_1$ | 10 000 cases: Formula | 10 000 cases: Simulation | 20 000 cases: Formula | 20 000 cases: Simulation | 30 000 cases: Formula | 30 000 cases: Simulation |
|------|-----|--------|--------------|--------|--------------|--------|--------------|
| 0.01 | 0.1 | 10.5%  | 10.2%        | 16.9%  | 16.6%        | 23.1%  | 22.4%        |
| 0.01 | 0.2 | 29.3%  | [illegible]  | 51.6%  | [illegible]  | 68.8%  | [illegible]  |
| 0.01 | 0.3 | 56.4%  | 56.4%        | 85.1%  | 85.0%        | 95.7%  | 95.7%        |
| 0.02 | 0.1 | 16.9%  | 17.2%        | 29.3%  | 28.9%        | 41.0%  | 41.1%        |
| 0.02 | 0.2 | 51.6%  | 51.0%        | 80.7%  | 80.2%        | 93.4%  | 93.6%        |
| 0.02 | 0.3 | 85.1%  | 84.9%        | 98.9%  | 98.9%        | 99.9%  | 100.0%       |
| 0.03 | 0.1 | 23.1%  | [illegible]  | 41.0%  | [illegible]  | 56.4%  | 57.0%        |
| 0.03 | 0.2 | 68.8%  | 68.5%        | 93.4%  | 93.3%        | 98.9%  | 99.0%        |
| 0.03 | 0.3 | 95.7%  | 95.5%        | 99.9%  | 99.9%        | 100.0% | 100.0%       |

Case:control ratio = 1:2 

| $\rho^2_{GX}$ | $\beta_1$ | 10 000 cases: Formula | 10 000 cases: Simulation | 20 000 cases: Formula | 20 000 cases: Simulation | 30 000 cases: Formula | 30 000 cases: Simulation |
|------|-----|--------|--------------|--------|--------------|--------|--------------|
| 0.01 | 0.1 | 12.6%  | 12.9%        | 21.0%  | 21.4%        | 29.3%  | 28.9%        |
| 0.01 | 0.2 | 37.2%  | 37.5%        | 63.7%  | 64.4%        | 80.7%  | 81.1%        |
| 0.01 | 0.3 | 68.8%  | 68.2%        | 93.4%  | 93.3%        | 98.9%  | 98.8%        |
| 0.02 | 0.1 | 21.0%  | 21.2%        | 37.2%  | 37.8%        | 51.6%  | 51.6%        |
| 0.02 | 0.2 | 63.7%  | 63.9%        | 90.4%  | 90.7%        | 97.9%  | 97.9%        |
| 0.02 | 0.3 | 93.4%  | 93.2%        | 99.8%  | 99.8%        | 100.0% | 100.0%       |
| 0.03 | 0.1 | 29.3%  | 29.0%        | 51.6%  | 51.4%        | 68.8%  | 68.8%        |
| 0.03 | 0.2 | 80.7%  | 80.8%        | 97.9%  | 97.7%        | 99.8%  | 99.9%        |
| 0.03 | 0.3 | 98.9%  | 98.9%        | 100.0% | 100.0%       | 100.0% | 100.0%       |



Discussion 

In this paper, we have provided information on sample 
sizes and power calculations in a Mendelian randomiza- 
tion analysis with a single IV and a binary outcome. We 
have shown in the continuous setting how the power de- 
pends on the magnitude of causal effect and the proportion 
of variance in the risk factor explained by the IV. With a 
binary outcome, the precision of the coefficient in the re- 
gression of the outcome on the IV is reduced compared 
with a continuous outcome, as the outcome can only take 
two values. As a result, the required sample sizes to obtain 
80% power are much larger. 

For a given applied example, the magnitude of the 
causal effect of a risk factor is fixed, as is the expected pro- 
portion of variance in the risk factor explained by each 
variant. However, the expected proportion of variance in 
the risk factor explained by the IV depends on the choice 
of IV. The required sample size for a given power level can 
be reduced (or equivalently the expected power at a given 
sample size can be increased) by including more genetic 
variants into the IV. This can be achieved by using multiple 
variants as separate IVs, 13 or as a single IV using an allele 
score approach. With an allele score, power can be further 
increased by the use of relevant weights for the variants. 10 
Provided that weights are not derived naively from the 
data under analysis, the allele score approach avoids some 
of the problems of bias from weak instruments resulting 
from using many IVs. 24 A disadvantage of the inclusion of 
many variants in an IV analysis, whether in a multiple IV 
or an allele score model, is that one or more of the variants 
may not be a valid IV. If a variant is associated with a con- 
founder of the risk factor-outcome association, or with the 
outcome through a pathway not via the risk factor of inter- 
est, then the estimate associated with this IV may be 
biased. If the function and relevance of some variants as 
IVs are uncertain, investigators will have to balance the 
risk of a biased analysis against the risk of an underpow- 
ered analysis. Sensitivity analysis may be a valuable tool to 
assess the homogeneity of IV estimates using different sets 
of variants. 

If there are missing data, this may adversely impact the 
power of an analysis. When there are multiple genetic vari- 
ants, individuals with sporadic missing genetic data can be 
included in an analysis using an imputation approach. 25 
This can minimize the impact of missing data on the power 
of the analysis, particularly if the distributions of genetic 
variants are correlated (the variants are in linkage 
disequilibrium). 

The calculations in this paper make several assump- 
tions. The distribution of the IV estimator is assumed to be 
well approximated by a normal distribution. This is known 
to be a poor approximation when the IV is weak; 26 how- 
ever, if the IV is weak, then the power will usually be low. 
The standard deviation of this normal distribution is 
assumed to be close to the first-order term from the delta 
expansion. This term only involves the uncertainty in the 
coefficient from the genetic association with the outcome. 
The uncertainty in the estimate of the genetic association 
with the risk factor is not accounted for. Typically, this un- 
certainty will be small in comparison as the genetic associ- 
ation with the outcome is assumed to be mediated through 
the risk factor. Again, if this uncertainty is large, then the 
power of the analysis will usually be low. If a more precise 
estimate of the power is required, either further terms from 
the delta expansion could be used, or a direct simulation 
approach could be undertaken. The model of the logistic- 
transformed probability of an outcome event is assumed to 
be linear in the risk factor. As the power is very sensitive to 
the squared correlation term $\rho^2_{GX}$, it is advisable to take a 
conservative estimate of this parameter, or to perform a 
sensitivity analysis for a range of values of $\rho^2_{GX}$. Despite 
these approximations, the validation simulation suggests 
that estimates of sample size and power from the formulae 
in this paper will be close to the true values for a range of 
realistic values of the parameters involved. 
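
A sensitivity analysis of the kind suggested above might look like the following Python sketch, which applies equation (11) over a range of plausible values of the squared correlation. The function name `required_cases` and the chosen grid of values are assumptions made for illustration, and a 1:1 ratio of cases to controls is assumed.

```python
from math import ceil
from statistics import NormalDist


def required_cases(beta1, rho2_gx, alpha=0.05, power=0.8):
    """Number of cases needed in a 1:1 case-control design, equation (11)."""
    z = NormalDist()  # standard normal distribution
    multiplier = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    # P(Y = 1) * P(Y = 0) = 0.25 with equal numbers of cases and controls
    total_n = multiplier / (beta1 ** 2 * rho2_gx * 0.25)
    return ceil(total_n / 2)


# Log odds ratio of 0.2 per SD; vary the assumed IV strength.
for rho2 in (0.01, 0.015, 0.02, 0.03):
    print(f"rho2_GX = {rho2:.3f}: {required_cases(0.2, rho2)} cases")
```

Because the required sample size is inversely proportional to the squared correlation, halving the assumed value doubles the number of cases needed.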

The ratio method used in this paper has been criticized 
for use with binary outcomes to estimate an odds 
ratio. 27,28 This is due to the non-collapsibility of the odds 
ratio, meaning that the parameter estimate depends on the 
choice of covariate adjustment. 29 This is a general property 
of odds ratios, and not a specific feature of the ratio 
method. The estimate from the ratio method approximates 
a population averaged odds ratio, 15 and is close to a condi- 
tional odds ratio under certain specific circumstances. 30 
The choice of odds ratio estimate does not affect the consistency 
of the estimator under the null. 31 As effect estimation 
is usually secondary to the demonstration of a causal 
effect, the precise identification of the parameter estimated 
by the ratio method is not of particular importance in 
Mendelian randomization analyses, and over-literal inter- 
pretation of Mendelian randomization estimates should be 
avoided even outside the odds ratio case. 32 

Although the sample sizes required in Mendelian ran- 
domization experiments are often large, it is not always ne- 
cessary to measure the risk factor on all of the participants 
in a study. Simulations have shown that, in some cases, 
90% of the power of the complete-data analysis can be obtained 
while only measuring the risk factor for 10% of participants. 33 
This means that obtaining measurements of the 
risk factor, which may be expensive or impractical for a 
large sample, should not be the prohibitive factor for a 
Mendelian randomization investigation. 

Supplementary Data 

Supplementary data are available at IJE online. 
Conflict of interest: None declared. 